Cooperative Solutions in Multi-Person Quadratic Decision Problems:
Finite-Horizon and State-Feedback Cost-Cumulant Control Paradigm


Khanh  D.  Pham 
Space  Vehicles  Directorate 
Air  Force  Research  Laboratory 
Kirtland  AFB,  NM  87117  U.S.A. 


Abstract: In the cooperative cost-cumulant control regime
for the class of multi-person single-objective decision problems
characterized by quadratic random costs and state-feedback
information structures, individual decision makers share state
information with their neighbors and then autonomously determine
decision strategies to achieve the desired goal of the group,
which is the minimization of a finite linear combination of the
first k cost cumulants of a finite-horizon integral quadratic cost
associated with a linear stochastic system. Since this problem
formulation is parameterized by the number of cost cumulants,
the scalar coefficients in the linear combination, and the group
of decision makers, it may be viewed both as a generalization of
linear-quadratic Gaussian control, when the first cost cumulant
is minimized by a single decision maker, and of the problem class
of linear-quadratic identical-goal stochastic games, when the
first cost cumulant is minimized by multiple decision makers.
Using a more direct dynamic programming approach to the
resultant cost-cumulant initial-cost problem, it is shown that
the decision laws associated with multiple persons are linear
and are found as the unique solutions of the set of coupled
differential matrix Riccati equations, whose solvability guarantees
the existence of the closed-loop feedback decision laws for the
corresponding multi-person single-objective decision problem.

I.  Introduction 

Cooperative control involves the control of a group of
entities that work collectively and efficiently to solve
a problem or meet a common objective. This is an emerging
area of research with widespread applications to problems in
several engineering disciplines and in economic analysis.
Within the context of performance analysis of cooperative
systems, the decision laws of cooperative decision makers are
adjusted repeatedly until a desired response is reached. It is
not at all clear how the decision law of each decision maker
affects the global closed-loop response of the total
system. There have been a number of attempts to evaluate
the performance of a stochastic system using the average
and variance of the associated performance measure. When
the performance measure is normally distributed, these two
statistics are sufficient for a full characterization
of the probability distribution. However, this is not always
the case. It turns out that in order to generalize the results for
second-order statistics to higher-order statistics, it is better to
consider higher-order cumulants, not higher-order moments.
One of the main results of the paper shows that the higher-order
cumulants of a finite-horizon integral quadratic form cost can
be obtained directly from the cumulant-generation equation.
The result relies heavily on the state-space representation
of the class of linear-quadratic decision problems. The other
important result consists of optimal decision laws associated
with a group of decision makers who simultaneously affect
the performance of the cooperative system via a complete
statistical description. The paper is structured as follows. The
next section prepares the necessary background in generating
higher-order statistics, which are then used to formulate the
cost-cumulant control problem for multiple decision makers.
A precise mathematical formulation involving problem statements
of the multi-person single-objective decision problem
is given next. Finally, the optimal decision laws are presented
in the last section with concluding remarks.

II.  Performance-Measure  Statistics 

Consider a stochastic decision problem with $N$ cooperative
decision makers, identified as $u_1, \ldots, u_N$. Suppose
$(t_0, x_0) \in [t_0, t_f] \times \mathbb{R}^n$ is fixed. An input noise
$w(t) = w(t,\omega) : [t_0,t_f] \times \Omega \mapsto \mathbb{R}^p$ is a
$p$-dimensional stationary Wiener process defined with
$\{\mathcal{F}_t\}_{t \ge 0}$ being its filtration on a complete filtered
probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathcal{P})$
over $[t_0, t_f]$ with the correlation of increments

$$E\left\{[w(\tau) - w(\xi)][w(\tau) - w(\xi)]^T\right\} = W|\tau - \xi|, \quad W > 0.$$

Furthermore, the decision sets
$U_i \subset L^2_{\mathcal{F}_t}(\Omega; C([t_0,t_f]; \mathbb{R}^{m_i}))$, $i = 1, \ldots, N$,
are assumed to be subsets of the Hilbert space of
$\mathbb{R}^{m_i}$-valued square-integrable processes on $[t_0,t_f]$ that are
adapted to the $\sigma$-field $\mathcal{F}_t$ generated by $w(t)$.
Associated with each $(u_1, \ldots, u_N) \in U_1 \times \cdots \times U_N$ is a
common finite-horizon integral quadratic form (IQF) payoff
functional $J : [t_0,t_f] \times \mathbb{R}^n \times U_1 \times \cdots \times U_N \mapsto \mathbb{R}^+$ such that

$$J(t_0, x_0; u_1, \ldots, u_N) = x^T(t_f) Q_f x(t_f) + \int_{t_0}^{t_f} \Big[ x^T(\tau) Q(\tau) x(\tau) + \sum_{i=1}^{N} u_i^T(\tau) R_i(\tau) u_i(\tau) \Big] d\tau \tag{1}$$

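where the states of the decision problem,
$x(t) = x(t,\omega) : [t_0,t_f] \times \Omega \mapsto \mathbb{R}^n$, belong to the Hilbert space
$L^2_{\mathcal{F}_t}(\Omega; C([t_0,t_f]; \mathbb{R}^n))$ with
$E\left\{\int_{t_0}^{t_f} x^T(\tau) x(\tau)\, d\tau\right\} < \infty$
and evolve according to the stochastic differential equation

$$dx(t) = \Big[ A(t) x(t) + \sum_{i=1}^{N} B_i(t) u_i(t) \Big] dt + G(t)\, dw(t), \quad x(t_0) = x_0, \tag{2}$$

in which $A \in C([t_0,t_f]; \mathbb{R}^{n \times n})$, $B_i \in C([t_0,t_f]; \mathbb{R}^{n \times m_i})$,
and $G \in C([t_0,t_f]; \mathbb{R}^{n \times p})$ are deterministic matrix-valued
functions, with each pair $(A, B_i)$ uniformly stabilizable. The
terminal weighting $Q_f \in \mathbb{R}^{n \times n}$, the state weighting
$Q \in C([t_0,t_f]; \mathbb{R}^{n \times n})$, and the control weightings
$R_i \in C([t_0,t_f]; \mathbb{R}^{m_i \times m_i})$ are
deterministic and positive semidefinite, with $R_i(t)$ invertible.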

In view of the linear system (2) and the quadratic
performance measure (1), it is reasonable to assume that
cooperative decision makers choose their decision laws from
a class of memoryless perfect-state strategies, as functions of
both time and states,
$\gamma_i : [t_0,t_f] \times L^2_{\mathcal{F}_t}(\Omega; C([t_0,t_f]; \mathbb{R}^n)) \mapsto L^2_{\mathcal{F}_t}(\Omega; C([t_0,t_f]; \mathbb{R}^{m_i}))$,

$$u_i(t) = \gamma_i(t, x(t)) = K_i(t) x(t), \tag{3}$$

where the admissible gains $K_i \in C([t_0,t_f]; \mathbb{R}^{m_i \times n})$ are
defined in appropriate senses. For a given initial condition
$(t_0, x_0) \in [t_0,t_f] \times \mathbb{R}^n$ and subject to the strategies (3),
the dynamics of the cooperative decision problem (2) are then
given by

$$dx(t) = \Big[ A(t) + \sum_{i=1}^{N} B_i(t) K_i(t) \Big] x(t)\, dt + G(t)\, dw(t), \quad x(t_0) = x_0, \tag{4}$$

and its IQF cost becomes

$$J(t_0, x_0; K_1, \ldots, K_N) = x^T(t_f) Q_f x(t_f) + \int_{t_0}^{t_f} x^T(\tau) \Big[ Q(\tau) + \sum_{i=1}^{N} K_i^T(\tau) R_i(\tau) K_i(\tau) \Big] x(\tau)\, d\tau. \tag{5}$$

It is now necessary to develop a procedure for generating
cost cumulants for the multi-person single-objective decision
problem by adapting the parametric method in [3] to characterize
a moment-generating function. These cost cumulants
are then used to form the performance index in the
cost-cumulant control optimization. This approach begins with a
replacement of the initial condition $(t_0, x_0)$ by any arbitrary
pair $(\alpha, x_\alpha)$. Thus, for the given admissible feedback gains
$K_1, \ldots, K_N$, the cost functional (5) is seen as the
“cost-to-go”, $J(\alpha, x_\alpha)$. The moment-generating function of the
vector-valued random process (4) is given by

$$\varphi(\alpha, x_\alpha; \theta) = E\left\{\exp\left(\theta J(\alpha, x_\alpha)\right)\right\}, \tag{6}$$

where the scalar $\theta \in \mathbb{R}^+$ is a small parameter. Thus, the
cumulant-generating function immediately follows

$$\psi(\alpha, x_\alpha; \theta) = \ln\left\{\varphi(\alpha, x_\alpha; \theta)\right\}, \tag{7}$$

in which $\ln\{\cdot\}$ denotes the natural logarithmic transformation
of an enclosed entity.
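To make the relation between the two generating functions concrete, the sketch below converts raw moments (the Maclaurin coefficients of a moment-generating function) into cumulants (the Maclaurin coefficients of its logarithm). It uses the standard moment-to-cumulant recursion, which is a textbook identity and not a construction taken from this paper; the function name `cumulants_from_moments` is hypothetical.

```python
from math import comb

def cumulants_from_moments(moments):
    """Convert raw moments [m1, m2, ...] into cumulants [k1, k2, ...].

    Standard recursion (an assumption of this sketch, not from the paper):
        k_n = m_n - sum_{j=1}^{n-1} C(n-1, j-1) * k_j * m_{n-j}
    """
    cumulants = []
    for n in range(1, len(moments) + 1):
        k_n = moments[n - 1]
        for j in range(1, n):
            # cumulants[j-1] is k_j, moments[n-j-1] is m_{n-j}
            k_n -= comb(n - 1, j - 1) * cumulants[j - 1] * moments[n - j - 1]
        cumulants.append(k_n)
    return cumulants

# Raw moments of a unit-rate exponential random variable: m_n = n!
exp_cumulants = cumulants_from_moments([1, 2, 6, 24])

# Raw moments of a standard normal random variable
normal_cumulants = cumulants_from_moments([0, 1, 0, 3])
```

For the normal example every cumulant beyond the second vanishes, which is one way to see why mean and variance suffice in the Gaussian case while higher-order cumulants carry the extra information otherwise.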

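Theorem 1: Cost-Cumulant Generating Function.

For all $\alpha \in [t_0,t_f]$ and the small parameter $\theta \in \mathbb{R}^+$, define

$$\varphi(\alpha, x_\alpha; \theta) = \varrho(\alpha, \theta) \exp\left\{x_\alpha^T \Upsilon(\alpha, \theta) x_\alpha\right\}, \tag{8}$$

$$\upsilon(\alpha, \theta) = \ln\{\varrho(\alpha, \theta)\}. \tag{9}$$

Then the cost-cumulant generating function is expressed as

$$\psi(\alpha, x_\alpha; \theta) = x_\alpha^T \Upsilon(\alpha, \theta) x_\alpha + \upsilon(\alpha, \theta), \tag{10}$$

where the scalar solution $\upsilon(\alpha, \theta)$ with $\upsilon(t_f, \theta) = 0$ solves

$$\frac{d}{d\alpha} \upsilon(\alpha, \theta) = -\mathrm{Tr}\left\{\Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha)\right\}, \tag{11}$$

and the matrix-valued solution $\Upsilon(\alpha, \theta)$ with $\Upsilon(t_f, \theta) = \theta Q_f$
satisfies

$$\begin{aligned}
\frac{d}{d\alpha} \Upsilon(\alpha, \theta) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T \Upsilon(\alpha, \theta)
 - \Upsilon(\alpha, \theta) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - 2 \Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha) \Upsilon(\alpha, \theta)
 - \theta \Big[ Q(\alpha) + \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha) \Big].
\end{aligned} \tag{12}$$

In addition, the auxiliary solution $\varrho(\alpha, \theta)$ satisfies the
backward-in-time differential equation with $\varrho(t_f, \theta) = 1$

$$\frac{d}{d\alpha} \varrho(\alpha, \theta) = -\varrho(\alpha, \theta)\, \mathrm{Tr}\left\{\Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha)\right\}. \tag{13}$$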

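Proof. For any $\theta$ given, let $\varpi(\alpha, x_\alpha; \theta) = \exp\{\theta J(\alpha, x_\alpha)\}$;
then the moment-generating function becomes
$\varphi(\alpha, x_\alpha; \theta) = E\{\varpi(\alpha, x_\alpha; \theta)\}$ with the time derivative

$$\frac{d}{d\alpha} \varphi(\alpha, x_\alpha; \theta) = -\varphi(\alpha, x_\alpha; \theta)\, \theta\, x_\alpha^T \Big[ Q(\alpha) + \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha) \Big] x_\alpha.$$

Using the standard Itô formula, one gets

$$\begin{aligned}
d\varphi(\alpha, x_\alpha; \theta) &= E\{d\varpi(\alpha, x_\alpha; \theta)\} \\
&= E\Big\{ \varpi_\alpha(\alpha, x_\alpha; \theta)\, d\alpha + \varpi_{x_\alpha}(\alpha, x_\alpha; \theta)\, dx_\alpha
 + \tfrac{1}{2} \mathrm{Tr}\left\{\varpi_{x_\alpha x_\alpha}(\alpha, x_\alpha; \theta) G(\alpha) W G^T(\alpha)\right\} d\alpha \Big\} \\
&= \varphi_\alpha(\alpha, x_\alpha; \theta)\, d\alpha
 + \varphi_{x_\alpha}(\alpha, x_\alpha; \theta) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] x_\alpha\, d\alpha \\
&\quad + \tfrac{1}{2} \mathrm{Tr}\left\{\varphi_{x_\alpha x_\alpha}(\alpha, x_\alpha; \theta) G(\alpha) W G^T(\alpha)\right\} d\alpha,
\end{aligned}$$

which with the definition (8) leads to

$$\begin{aligned}
-\varphi(\alpha, x_\alpha; \theta)\, \theta\, x_\alpha^T \Big[ Q(\alpha) + \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha) \Big] x_\alpha
= {} & \frac{\frac{d}{d\alpha} \varrho(\alpha, \theta)}{\varrho(\alpha, \theta)}\, \varphi(\alpha, x_\alpha; \theta)
 + \varphi(\alpha, x_\alpha; \theta)\, x_\alpha^T \frac{d}{d\alpha} \Upsilon(\alpha, \theta)\, x_\alpha \\
& + \varphi(\alpha, x_\alpha; \theta)\, x_\alpha^T \Big\{ \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T \Upsilon(\alpha, \theta)
 + \Upsilon(\alpha, \theta) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \Big\} x_\alpha \\
& + \varphi(\alpha, x_\alpha; \theta) \Big\{ 2 x_\alpha^T \Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha) \Upsilon(\alpha, \theta) x_\alpha
 + \mathrm{Tr}\left\{\Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha)\right\} \Big\}.
\end{aligned}$$

To have the constant and quadratic terms independent of
$x_\alpha$, it is required that

$$\begin{aligned}
\frac{d}{d\alpha} \Upsilon(\alpha, \theta) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T \Upsilon(\alpha, \theta)
 - \Upsilon(\alpha, \theta) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - 2 \Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha) \Upsilon(\alpha, \theta)
 - \theta \Big[ Q(\alpha) + \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha) \Big], \\
\frac{d}{d\alpha} \varrho(\alpha, \theta) = {} & -\varrho(\alpha, \theta)\, \mathrm{Tr}\left\{\Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha)\right\},
\end{aligned}$$

with the terminal conditions $\Upsilon(t_f, \theta) = \theta Q_f$ and
$\varrho(t_f, \theta) = 1$. Finally, the remaining backward-in-time
differential equation satisfied by $\upsilon(\alpha, \theta)$ is given by

$$\frac{d}{d\alpha} \upsilon(\alpha, \theta) = -\mathrm{Tr}\left\{\Upsilon(\alpha, \theta) G(\alpha) W G^T(\alpha)\right\}, \quad \upsilon(t_f, \theta) = 0,$$

which completes the proof.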

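It is now possible to generate cost cumulants for the multi-person
decision-making problem by looking at a Maclaurin
series expansion of the cumulant-generating function

$$\psi(\alpha, x_\alpha; \theta) = \sum_{j=1}^{\infty} \kappa_j(\alpha, x_\alpha) \frac{\theta^j}{j!}
 = \sum_{j=1}^{\infty} \frac{\partial^{(j)}}{\partial \theta^{(j)}} \psi(\alpha, x_\alpha; \theta) \Big|_{\theta = 0} \frac{\theta^j}{j!}, \tag{14}$$

in which the $\kappa_j(\alpha, x_\alpha)$ are called the cost cumulants. Notice
that the series coefficients can be computed by using (10)

$$\frac{\partial^{(j)}}{\partial \theta^{(j)}} \psi(\alpha, x_\alpha; \theta) \Big|_{\theta = 0}
 = x_\alpha^T \frac{\partial^{(j)}}{\partial \theta^{(j)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0} x_\alpha
 + \frac{\partial^{(j)}}{\partial \theta^{(j)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0}. \tag{15}$$

In view of the results (14) and (15), cost cumulants for the
stochastic decision problem can be obtained as

$$\kappa_j(\alpha, x_\alpha) = x_\alpha^T \frac{\partial^{(j)}}{\partial \theta^{(j)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0} x_\alpha
 + \frac{\partial^{(j)}}{\partial \theta^{(j)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0} \tag{16}$$

for any finite $1 \le j < \infty$. For notational convenience, the
following definitions are introduced

$$H(\alpha, j) \triangleq \frac{\partial^{(j)}}{\partial \theta^{(j)}} \Upsilon(\alpha, \theta) \Big|_{\theta = 0}; \qquad
D(\alpha, j) \triangleq \frac{\partial^{(j)}}{\partial \theta^{(j)}} \upsilon(\alpha, \theta) \Big|_{\theta = 0}. \tag{17}$$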

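Theorem 2: Cost Cumulants in Decision Problems.

Let decision makers choose their control strategies
$(u_1(t), \ldots, u_N(t)) = (K_1(t) x(t), \ldots, K_N(t) x(t))$, where
the dynamics of the multi-person single-objective decision
system is governed by the linear stochastic differential equation
(4) and is associated with the finite-horizon IQF payoff
functional (5). For $k \in \mathbb{Z}^+$ fixed and $1 \le r \le k$, the $k$th cost
cumulant in the multi-person decision-making problem is
of the form

$$\kappa_k(t_0, x_0; K_1, \ldots, K_N) = x_0^T H(t_0, k) x_0 + D(t_0, k), \tag{18}$$

in which the cumulant-building variables $\{H(\alpha, r)\}_{r=1}^{k}$ and
$\{D(\alpha, r)\}_{r=1}^{k}$ evaluated at $\alpha = t_0$ satisfy the following
differential equations (with the dependence of $H(\alpha, r)$ and
$D(\alpha, r)$ upon the admissible gains $K_1, \ldots, K_N$ suppressed)

$$\begin{aligned}
\frac{d}{d\alpha} H(\alpha, 1) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T H(\alpha, 1)
 - H(\alpha, 1) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - Q(\alpha) - \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha),
\end{aligned} \tag{19}$$

and, for $2 \le r \le k$

$$\begin{aligned}
\frac{d}{d\alpha} H(\alpha, r) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T H(\alpha, r)
 - H(\alpha, r) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - \sum_{s=1}^{r-1} \frac{2\, r!}{s!\,(r-s)!} H(\alpha, s) G(\alpha) W G^T(\alpha) H(\alpha, r-s),
\end{aligned} \tag{20}$$

together with, for $1 \le r \le k$

$$\frac{d}{d\alpha} D(\alpha, r) = -\mathrm{Tr}\left\{H(\alpha, r) G(\alpha) W G^T(\alpha)\right\}, \tag{21}$$

where the terminal conditions are $H(t_f, 1) = Q_f$, $H(t_f, r) = 0$
for $2 \le r \le k$ and $D(t_f, r) = 0$ for $1 \le r \le k$.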

Proof. The cost cumulant expression in (18) is readily
justified by using the result (16) and the definitions (17).
What remains is to show that the solutions $H(\alpha, r)$ and $D(\alpha, r)$
for $1 \le r \le k$ indeed satisfy the equations (19)-(21). The
equations (19)-(21) satisfied by $H(\alpha, r)$ and $D(\alpha, r)$
are obtained by repeatedly taking the derivative with respect
to $\theta$ of the equations (11)-(12) and evaluating the result at
$\theta = 0$, together with the assumption that
$A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha)$ is
stable for all $\alpha \in [t_0, t_f]$.
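As an illustration of how (19)-(21) can be evaluated numerically, the sketch below integrates their scalar versions backward from $t_f$ to $t_0$ with a fixed-step fourth-order Runge-Kutta scheme. It is a minimal sketch under simplifying assumptions that are not in the paper: a single state, time-invariant coefficients, and gains already folded into the closed-loop quantities `a_cl` and `q_cl`. The function names are hypothetical.

```python
from math import factorial

def scalar_cost_cumulants(a_cl, q_cl, g2w, qf, t0, tf, k, steps=2000):
    """Integrate scalar versions of (19)-(21) backward from tf to t0 (RK4).

    a_cl : closed-loop drift A + sum_i B_i K_i (assumed constant here)
    q_cl : closed-loop weight Q + sum_i K_i^T R_i K_i (assumed constant)
    g2w  : noise intensity G W G^T
    qf   : terminal weight Q_f
    Returns (H, D) with H[r-1] = H(t0, r) and D[r-1] = D(t0, r).
    """
    def rhs(y):
        H = y[:k]
        dH = []
        for r in range(1, k + 1):
            val = -2.0 * a_cl * H[r - 1]
            if r == 1:
                val -= q_cl                      # forcing term of (19)
            for s in range(1, r):                # coupling sum of (20)
                coeff = 2.0 * factorial(r) / (factorial(s) * factorial(r - s))
                val -= coeff * H[s - 1] * g2w * H[r - s - 1]
            dH.append(val)
        dD = [-H[r] * g2w for r in range(k)]     # equation (21)
        return dH + dD

    y = [qf] + [0.0] * (2 * k - 1)               # terminal values at tf
    h = (t0 - tf) / steps                        # negative step: backward in time
    for _ in range(steps):
        c1 = rhs(y)
        c2 = rhs([yi + 0.5 * h * ci for yi, ci in zip(y, c1)])
        c3 = rhs([yi + 0.5 * h * ci for yi, ci in zip(y, c2)])
        c4 = rhs([yi + h * ci for yi, ci in zip(y, c3)])
        y = [yi + h / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
             for yi, d1, d2, d3, d4 in zip(y, c1, c2, c3, c4)]
    return y[:k], y[k:]

def cumulant(H, D, x0, r):
    """Evaluate (18) in the scalar case: kappa_r = H(t0, r) x0^2 + D(t0, r)."""
    return H[r - 1] * x0 ** 2 + D[r - 1]

# Driftless test case: dx = dw, cost = integral of x^2 over [0, 1]
H, D = scalar_cost_cumulants(a_cl=0.0, q_cl=1.0, g2w=1.0, qf=0.0,
                             t0=0.0, tf=1.0, k=2)
```

For this driftless case the equations can be solved by hand, giving $H(t_0,1) = 1$, $D(t_0,1) = 1/2$, $H(t_0,2) = 4/3$ and $D(t_0,2) = 1/3$; the last value is the variance of $\int_0^1 w^2(t)\,dt$ for a standard Wiener process started at zero, which the numerical result reproduces.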


III.  Problem  Statements 
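In the subsequent development, the subset of symmetric
matrices of the vector space of all $n \times n$ matrices with
real elements is denoted by $\mathbb{S}^n$. Now let the $k$-tuple variables
$\mathcal{H}$ and $\mathcal{D}$ be defined as $\mathcal{H}(\cdot) = (\mathcal{H}_1(\cdot), \ldots, \mathcal{H}_k(\cdot))$
and $\mathcal{D}(\cdot) = (\mathcal{D}_1(\cdot), \ldots, \mathcal{D}_k(\cdot))$, where each element
$\mathcal{H}_r \in C^1([t_0,t_f]; \mathbb{S}^n)$ of $\mathcal{H}$ and $\mathcal{D}_r \in C^1([t_0,t_f]; \mathbb{R})$ of $\mathcal{D}$ have the
representations $\mathcal{H}_r(\cdot) = H(\cdot, r)$ and $\mathcal{D}_r(\cdot) = D(\cdot, r)$, with
the right members satisfying the dynamic equations (19)-(21)
on the horizon $[t_0, t_f]$. For notational simplicity, the
following mappings are introduced

$$\mathcal{F}_r : [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^{m_1 \times n} \times \cdots \times \mathbb{R}^{m_N \times n} \mapsto \mathbb{S}^n,$$

$$\mathcal{G}_r : [t_0,t_f] \times (\mathbb{S}^n)^k \mapsto \mathbb{R},$$

with the actions given by

$$\begin{aligned}
\mathcal{F}_1(\alpha, \mathcal{H}, K_1, \ldots, K_N) \triangleq {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T \mathcal{H}_1(\alpha)
 - \mathcal{H}_1(\alpha) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - Q(\alpha) - \sum_{i=1}^{N} K_i^T(\alpha) R_i(\alpha) K_i(\alpha),
\end{aligned}$$

and, for $2 \le r \le k$

$$\begin{aligned}
\mathcal{F}_r(\alpha, \mathcal{H}, K_1, \ldots, K_N) \triangleq {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big]^T \mathcal{H}_r(\alpha)
 - \mathcal{H}_r(\alpha) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i(\alpha) \Big] \\
& - \sum_{s=1}^{r-1} \frac{2\, r!}{s!\,(r-s)!} \mathcal{H}_s(\alpha) G(\alpha) W G^T(\alpha) \mathcal{H}_{r-s}(\alpha),
\end{aligned}$$

finally, for $1 \le r \le k$

$$\mathcal{G}_r(\alpha, \mathcal{H}) \triangleq -\mathrm{Tr}\left\{\mathcal{H}_r(\alpha) G(\alpha) W G^T(\alpha)\right\}.$$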

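For a compact formulation, the product mappings are
established such that
$\mathcal{F}_1 \times \cdots \times \mathcal{F}_k : [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^{m_1 \times n} \times \cdots \times \mathbb{R}^{m_N \times n} \mapsto (\mathbb{S}^n)^k$
and $\mathcal{G}_1 \times \cdots \times \mathcal{G}_k : [t_0,t_f] \times (\mathbb{S}^n)^k \mapsto \mathbb{R}^k$,
along with the corresponding notations
$\mathcal{F} = \mathcal{F}_1 \times \cdots \times \mathcal{F}_k$ and $\mathcal{G} = \mathcal{G}_1 \times \cdots \times \mathcal{G}_k$. Thus, the dynamic
equations of motion (19)-(21) can be rewritten as follows

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}(\alpha, \mathcal{H}(\alpha), K_1(\alpha), \ldots, K_N(\alpha)), \tag{22}$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}(\alpha)), \tag{23}$$

where the terminal values are $\mathcal{H}(t_f) = \mathcal{H}_f = (Q_f, 0, \ldots, 0)$ and
$\mathcal{D}(t_f) = \mathcal{D}_f = (0, \ldots, 0)$.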

Note that the product system uniquely determines $\mathcal{H}$ and
$\mathcal{D}$ once the admissible feedback gains $K_1, \ldots, K_N$ are specified.
Hence, $\mathcal{H}$ and $\mathcal{D}$ are considered as $\mathcal{H}(\cdot, K_1, \ldots, K_N)$
and $\mathcal{D}(\cdot, K_1, \ldots, K_N)$, respectively. The performance index
in the cost-cumulant control problem can now be formulated
in the admissible feedback gains $K_1, \ldots, K_N$.

Definition 1: Performance Index.

Fix $k \in \mathbb{Z}^+$ and the sequence $\mu = \{\mu_l \ge 0\}_{l=1}^{k}$ with $\mu_1 > 0$.
Then for the given initial condition $(t_0, x_0)$, the performance
index $\phi_0 : [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k \mapsto \mathbb{R}^+$ of the cost-cumulant
control is defined as

$$\begin{aligned}
\phi_0\big(t_0, \mathcal{H}(t_0, K_1, \ldots, K_N), \mathcal{D}(t_0, K_1, \ldots, K_N)\big)
 &= \sum_{l=1}^{k} \mu_l \kappa_l(K_1, \ldots, K_N) \\
 &= \sum_{l=1}^{k} \mu_l \big[ x_0^T \mathcal{H}_l(t_0, K_1, \ldots, K_N) x_0 + \mathcal{D}_l(t_0, K_1, \ldots, K_N) \big],
\end{aligned} \tag{24}$$

where the real constants $\mu_l$, mutually chosen by the cooperative
decision makers, represent the different levels of influence that
they deem important to the overall cost distribution, and the
symmetric solutions $\{\mathcal{H}_l(t_0, K_1, \ldots, K_N) \ge 0\}_{l=1}^{k}$ and
$\{\mathcal{D}_l(t_0, K_1, \ldots, K_N) \ge 0\}_{l=1}^{k}$ evaluated at $\alpha = t_0$ satisfy
the equations (22)-(23).
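The performance index (24) is a weighted sum of quadratic-affine terms, which the short sketch below evaluates directly. The function names and the numerical values are hypothetical and serve only to show how the weights $\mu_l$ combine the cumulant-building variables at $t_0$.

```python
def quadratic_form(x, M):
    """Return x^T M x for a vector x and a square matrix M (nested lists)."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def performance_index(mu, H0, D0, x0):
    """Weighted sum of the first k cost cumulants, as in (24).

    mu : weights mu_1, ..., mu_k (mu_1 > 0, the rest nonnegative)
    H0 : list of k symmetric matrices H_l(t0)
    D0 : list of k scalars D_l(t0)
    x0 : initial state
    """
    if mu[0] <= 0 or any(m < 0 for m in mu):
        raise ValueError("need mu_1 > 0 and mu_l >= 0")
    return sum(m * (quadratic_form(x0, H) + d) for m, H, d in zip(mu, H0, D0))

# Hypothetical values for a two-state system with k = 2 cumulants
mu = [1.0, 0.5]
H0 = [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
D0 = [0.5, 0.25]
phi0 = performance_index(mu, H0, D0, x0=[1.0, 2.0])
```

Setting all weights beyond $\mu_1$ to zero recovers the mean-cost criterion, in line with the abstract's remark that the first-cumulant case corresponds to linear-quadratic Gaussian control.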

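For the given terminal data $(t_f, \mathcal{H}_f, \mathcal{D}_f)$, the classes
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}, \ldots, \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$ of admissible feedback gains
may be defined as follows.

Definition 2: Admissible Feedback Decision Strategies.
Let the compact subsets $\overline{K}_1 \subset \mathbb{R}^{m_1 \times n}, \ldots, \overline{K}_N \subset \mathbb{R}^{m_N \times n}$
be the sets of allowable gain values. For the
given $k \in \mathbb{Z}^+$ and the sequence $\mu = \{\mu_l \ge 0\}_{l=1}^{k}$
with $\mu_1 > 0$, the sets of admissible decision strategies
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}, \ldots, \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$ are assumed to be the classes
of $C([t_0,t_f]; \mathbb{R}^{m_1 \times n}), \ldots, C([t_0,t_f]; \mathbb{R}^{m_N \times n})$ with values
$K_1(\cdot) \in \overline{K}_1, \ldots, K_N(\cdot) \in \overline{K}_N$ for which solutions to the
dynamic equations of motion (22)-(23) exist on the finite
horizon $[t_0, t_f]$.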

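Definition 3: Optimization Problem.

Suppose that $k \in \mathbb{Z}^+$ and the sequence $\mu = \{\mu_l \ge 0\}_{l=1}^{k}$
with $\mu_1 > 0$ are fixed. Then the cost-cumulant
control optimization problem over $[t_0,t_f]$ is given by the
minimization of the performance index (24) over all
$K_1(\cdot) \in \mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}, \ldots, K_N(\cdot) \in \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$
and subject to the
dynamic equations (22)-(23) for $\alpha \in [t_0,t_f]$.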

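Now introduce the value function $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ for the
decision problem starting at the time-states triple $(\varepsilon, \mathcal{Y}, \mathcal{Z})$.

Definition 4: Value Function.

The value function $\mathcal{V} : [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k \mapsto \mathbb{R}^+ \cup \{+\infty\}$
associated with the Mayer problem is defined by

$$\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \triangleq \min_{K_1(\cdot) \in \mathcal{K}^1_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu}, \ldots, K_N(\cdot) \in \mathcal{K}^N_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu}}
 \phi_0\big(t_0, \mathcal{H}(t_0, K_1, \ldots, K_N), \mathcal{D}(t_0, K_1, \ldots, K_N)\big)$$

for any $(\varepsilon, \mathcal{Y}, \mathcal{Z}) \in [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k$.

Conventionally, set $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \infty$ when any of
$\mathcal{K}^1_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu}, \ldots, \mathcal{K}^N_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu}$ is empty. The development in the
sequel is motivated by the excellent treatment in [2] and
is intended to follow it closely. Unless otherwise specified,
the dependence of the trajectory solutions $\mathcal{H}$ and $\mathcal{D}$ on the
admissible gains $K_1, \ldots, K_N$ is now omitted for notational
clarity.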

Theorem  3:  Property  1:  Necessary  Condition. 

The value function evaluated along any trajectory corresponding
to a pair of control strategy gains feasible for its
terminal states is a non-increasing function of time.

Theorem  4:  Property  2:  Necessary  Condition. 

The  value  function  evaluated  along  any  optimal  trajectory  is 
constant. 

It  is  important  to  note  that  these  properties  are  necessary 
conditions  for  optimality.  The  next  theorem  shows  that  these 
conditions  are  also  sufficient  for  optimality. 

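Theorem 5: Sufficient Condition.

Let $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be an extended real-valued function defined
on $[t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k$ such that $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \phi_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$.

Let $t_f$, $\mathcal{H}_f$, $\mathcal{D}_f$ be given terminal conditions and let, for
each trajectory pair $(\mathcal{H}, \mathcal{D})$ corresponding to the decision
strategies $(K_1, \ldots, K_N)$ in
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$,
$\mathcal{W}(\alpha, \mathcal{H}(\alpha), \mathcal{D}(\alpha))$ be finite and non-increasing on $[t_0,t_f]$.

If $(K_1^*, \ldots, K_N^*)$ are decision strategies in
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$
such that for the corresponding trajectory
pair $(\mathcal{H}^*, \mathcal{D}^*)$, $\mathcal{W}(\alpha, \mathcal{H}^*(\alpha), \mathcal{D}^*(\alpha))$ is constant, then
$(K_1^*, \ldots, K_N^*)$ are optimal strategies and
$\mathcal{W}(t_f, \mathcal{H}_f, \mathcal{D}_f) = \mathcal{V}(t_f, \mathcal{H}_f, \mathcal{D}_f)$.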

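Corollary 1: Restriction of Decision Strategies.

Let $(K_1^*, \ldots, K_N^*)$ be optimal decision strategies in
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$
and $(\mathcal{H}^*, \mathcal{D}^*)$ the corresponding
trajectory pair of the dynamic equations

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}(\alpha, \mathcal{H}(\alpha), K_1(\alpha), \ldots, K_N(\alpha)), \quad \mathcal{H}(t_f) = \mathcal{H}_f,$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}(\alpha)), \quad \mathcal{D}(t_f) = \mathcal{D}_f.$$

Then the restrictions of $(K_1^*, \ldots, K_N^*)$ to $[t_0, \alpha]$ are optimal
decision strategies for each control problem with terminal
conditions $(\alpha, \mathcal{H}^*(\alpha), \mathcal{D}^*(\alpha))$ when $t_0 \le \alpha \le t_f$.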

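Remarks. The necessary and sufficient conditions implied
by these properties for a control gain to be optimal suggest
that one may look for a function
$\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) : [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k \mapsto \mathbb{R}^+$
such that $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \phi_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$,
$\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is constant on the corresponding trajectory pair,
and $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is non-increasing on other trajectories.

Note that the value function $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is supposed to be
continuously differentiable in $(\varepsilon, \mathcal{Y}, \mathcal{Z})$. Formally speaking,
the result regarding the differentiability of the value function
adapted from [2] is stated as follows.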

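Theorem 6: Differentiability of Value Function.

Let $K_1^*(\alpha, \mathcal{H}, \mathcal{D}), K_2^*(\alpha, \mathcal{H}, \mathcal{D}), \ldots, K_N^*(\alpha, \mathcal{H}, \mathcal{D})$,
$t_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$, and
$\big(\mathcal{H}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Y}), \mathcal{D}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Z})\big)$
be optimal decision laws, an initial time, and initial states
for the trajectories of

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}\big(\alpha, \mathcal{H}, K_1^*(\alpha, \mathcal{H}, \mathcal{D}), \ldots, K_N^*(\alpha, \mathcal{H}, \mathcal{D})\big),$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}),$$

with the terminal condition $(\varepsilon, \mathcal{Y}, \mathcal{Z})$. Then, the value
function $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is differentiable at each point
at which $t_0(\varepsilon, \mathcal{Y}, \mathcal{Z})$, $\mathcal{H}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Y})$, and
$\mathcal{D}(t_0(\varepsilon, \mathcal{Y}, \mathcal{Z}); \varepsilon, \mathcal{Z})$ are differentiable with respect to
$(\varepsilon, \mathcal{Y}, \mathcal{Z})$.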

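Definition 5: Playable Set.

Let the playable set $\mathcal{Q}$ be defined as follows

$$\mathcal{Q} \triangleq \left\{ (\varepsilon, \mathcal{Y}, \mathcal{Z}) \in [t_0,t_f] \times (\mathbb{S}^n)^k \times \mathbb{R}^k
 \ \text{such that}\ \mathcal{K}^1_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu} \times \cdots \times \mathcal{K}^N_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu} \neq \emptyset \right\}.$$
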
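Theorem 7: HJB Equation-Mayer Problem.

Let $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be any interior point of the playable set $\mathcal{Q}$ at
which the value function $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is differentiable. Then
$\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ satisfies the partial differential inequality

$$0 \ge \frac{\partial}{\partial \varepsilon} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N)\big)
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big)$$

for all $(K_1, \ldots, K_N) \in \overline{K}_1 \times \cdots \times \overline{K}_N$.

If there exist optimal decision strategies
$(K_1^*, \ldots, K_N^*) \in \mathcal{K}^1_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu} \times \cdots \times \mathcal{K}^N_{\varepsilon, \mathcal{Y}, \mathcal{Z}; \mu}$,
then the partial differential equation of decision problems

$$0 = \min_{K_1 \in \overline{K}_1, \ldots, K_N \in \overline{K}_N} \Big\{ \frac{\partial}{\partial \varepsilon} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N)\big)
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big) \Big\} \tag{25}$$

is satisfied together with $\mathcal{V}(t_0, \mathcal{H}_0, \mathcal{D}_0) = \phi_0(t_0, \mathcal{H}_0, \mathcal{D}_0)$
and $\mathrm{vec}(\cdot)$ the vectorizing operator of enclosed entities.
The optimum in (25) is achieved by the left limit
$(K_1^*(\varepsilon)^-, \ldots, K_N^*(\varepsilon)^-)$ of the optimal strategies at $\varepsilon$.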

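Theorem 8: Verification Theorem.

Fix $k \in \mathbb{Z}^+$. Let $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be a continuously differentiable
solution of the HJB equation

$$0 = \min_{K_1 \in \overline{K}_1, \ldots, K_N \in \overline{K}_N} \Big\{ \frac{\partial}{\partial \varepsilon} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N)\big)
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big) \Big\}$$

and satisfy the boundary condition for all $(t_0, \mathcal{H}_0, \mathcal{D}_0) \in \mathcal{M}$

$$\mathcal{W}(t_0, \mathcal{H}_0, \mathcal{D}_0) = \phi_0(t_0, \mathcal{H}_0, \mathcal{D}_0), \tag{26}$$

where $\mathcal{M} = \{t_0\} \times (\mathbb{S}^n)^k \times \mathbb{R}^k$.

Let $(t_f, \mathcal{H}_f, \mathcal{D}_f)$ be a point of $\mathcal{Q}$, $(K_1, \ldots, K_N)$ control
strategies in $\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$,
and $\mathcal{H}$ and $\mathcal{D}$
the corresponding solutions of the equations

$$\frac{d}{d\alpha} \mathcal{H}(\alpha) = \mathcal{F}(\alpha, \mathcal{H}(\alpha), K_1(\alpha), \ldots, K_N(\alpha)), \quad \mathcal{H}(t_f) = \mathcal{H}_f,$$

$$\frac{d}{d\alpha} \mathcal{D}(\alpha) = \mathcal{G}(\alpha, \mathcal{H}(\alpha)), \quad \mathcal{D}(t_f) = \mathcal{D}_f.$$

Then $\mathcal{W}(\alpha, \mathcal{H}(\alpha), \mathcal{D}(\alpha))$ is a non-increasing function of $\alpha$.
If $(K_1^*, \ldots, K_N^*)$ are control strategies in
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$
defined on $[t_0,t_f]$ with corresponding solutions
$\mathcal{H}^*$ and $\mathcal{D}^*$ of the above equations such that for $\alpha \in [t_0,t_f]$

$$\begin{aligned}
0 = {} & \frac{\partial}{\partial \varepsilon} \mathcal{W}(\alpha, \mathcal{H}^*(\alpha), \mathcal{D}^*(\alpha))
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} \mathcal{W}(\alpha, \mathcal{H}^*(\alpha), \mathcal{D}^*(\alpha)) \cdot
 \mathrm{vec}\big(\mathcal{F}(\alpha, \mathcal{H}^*(\alpha), K_1^*(\alpha), \ldots, K_N^*(\alpha))\big) \\
& + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} \mathcal{W}(\alpha, \mathcal{H}^*(\alpha), \mathcal{D}^*(\alpha)) \cdot
 \mathrm{vec}\big(\mathcal{G}(\alpha, \mathcal{H}^*(\alpha))\big),
\end{aligned} \tag{27}$$

then $(K_1^*, \ldots, K_N^*)$ are optimal decision strategies in
$\mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$ and

$$\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z}), \tag{28}$$

where $\mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is the value function.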

It is observed that, to have an optimal solution along
with the decision laws
$(K_1^*, \ldots, K_N^*) \in \mathcal{K}^1_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu} \times \cdots \times \mathcal{K}^N_{t_f, \mathcal{H}_f, \mathcal{D}_f; \mu}$
well defined and continuous for all $\alpha \in [t_0,t_f]$,
the solution $\mathcal{H}(\alpha)$ to the equation (22) when evaluated at
$\alpha = t_0$ must exist. Therefore, it is necessary that $\mathcal{H}(\alpha)$
is finite for all $\alpha \in [t_0,t_f)$. Moreover, the solution of the
equation (22) exists and is continuously differentiable in a
neighborhood of $t_f$. Applying the results from [1], these
solutions can further be extended to the left of $t_f$ as long
as $\mathcal{H}(\alpha)$ remains finite. Hence, the existence of unique and
continuously differentiable solutions to the equation (22) is
certain if $\mathcal{H}(\alpha)$ is bounded for all $\alpha \in [t_0, t_f)$. As a result,
the candidate value functions $\mathcal{V}(\alpha, \mathcal{H}, \mathcal{D})$ are continuously
differentiable as well. The following theorem is proven.

Theorem 9: Necessary and Sufficient Conditions.

$(K_1^*, \ldots, K_N^*)$ are optimal strategies if and only if $\mathcal{H}(\alpha)$ is
bounded for all $\alpha \in [t_0,t_f)$.


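IV. Strategies by Cooperative Decision Makers

Recall that the optimization problem being considered
herein is in “Mayer form” and can be solved by applying
an adaptation of the Mayer-form verification theorem of
dynamic programming given in [2]. In the framework of
dynamic programming, it is often required to denote the
terminal time and states of a family of optimization problems
as $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ rather than $(t_f, \mathcal{H}_f, \mathcal{D}_f)$. That is, for $\varepsilon \in [t_0, t_f]$
and $1 \le l \le k$, the states of the system (22)-(23) defined
on the interval $[t_0, \varepsilon]$ have the terminal values denoted by
$\mathcal{H}(\varepsilon) = \mathcal{Y}$ and $\mathcal{D}(\varepsilon) = \mathcal{Z}$. Since the performance index
(24) is quadratic affine in terms of the arbitrarily fixed $x_0$, this
observation suggests that a solution to the HJB equation (25) may
be of the following form.

Theorem 10: Candidate Value-Function.

Fix $k \in \mathbb{Z}^+$ and let $(\varepsilon, \mathcal{Y}, \mathcal{Z})$ be any interior point of the
reachable set $\mathcal{Q}$ at which the real-valued function

$$\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = x_0^T \sum_{l=1}^{k} \mu_l \big(\mathcal{Y}_l + \mathcal{E}_l(\varepsilon)\big) x_0
 + \sum_{l=1}^{k} \mu_l \big(\mathcal{Z}_l + \mathcal{T}_l(\varepsilon)\big) \tag{29}$$

is differentiable. The parametric functions of time
$\mathcal{E}_l \in C^1([t_0, t_f]; \mathbb{S}^n)$ and $\mathcal{T}_l \in C^1([t_0, t_f]; \mathbb{R})$ are yet to be
determined. Furthermore, the time derivative of $\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})$ is

$$\frac{d}{d\varepsilon} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \sum_{l=1}^{k} \mu_l \Big( \mathcal{G}_l(\varepsilon, \mathcal{Y}) + \frac{d}{d\varepsilon} \mathcal{T}_l(\varepsilon) \Big)
 + x_0^T \sum_{l=1}^{k} \mu_l \Big( \mathcal{F}_l(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N) + \frac{d}{d\varepsilon} \mathcal{E}_l(\varepsilon) \Big) x_0. \tag{30}$$

The substitution of this hypothesized solution (29) into the
HJB equation (25) and making use of the result (30) yields

$$\begin{aligned}
0 &= \min_{K_1 \in \overline{K}_1, \ldots, K_N \in \overline{K}_N} \Big\{ \frac{\partial}{\partial \varepsilon} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z})
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Y})} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{F}(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N)\big)
 + \frac{\partial}{\partial\, \mathrm{vec}(\mathcal{Z})} \mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) \cdot \mathrm{vec}\big(\mathcal{G}(\varepsilon, \mathcal{Y})\big) \Big\} \\
&= \min_{K_1 \in \overline{K}_1, \ldots, K_N \in \overline{K}_N} \Big\{ x_0^T \Big( \sum_{l=1}^{k} \mu_l \frac{d}{d\varepsilon} \mathcal{E}_l(\varepsilon) \Big) x_0
 + \sum_{l=1}^{k} \mu_l \frac{d}{d\varepsilon} \mathcal{T}_l(\varepsilon)
 + \sum_{l=1}^{k} \mu_l \mathcal{G}_l(\varepsilon, \mathcal{Y})
 + x_0^T \sum_{l=1}^{k} \mu_l \mathcal{F}_l(\varepsilon, \mathcal{Y}, K_1, \ldots, K_N)\, x_0 \Big\}.
\end{aligned} \tag{31}$$

Differentiating the expression within the bracket of (31) with
respect to $K_1, \ldots, K_N$ yields the necessary conditions for an
extremum of the performance index (24) on $[t_0, \varepsilon]$,

$$-2 B_1^T(\varepsilon) \sum_{l=1}^{k} \mu_l \mathcal{Y}_l M_0 - 2 \mu_1 R_1(\varepsilon) K_1 M_0 = 0,$$

$$\vdots$$

$$-2 B_N^T(\varepsilon) \sum_{l=1}^{k} \mu_l \mathcal{Y}_l M_0 - 2 \mu_1 R_N(\varepsilon) K_N M_0 = 0.$$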


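Because $M_0 = x_0 x_0^T$ is an arbitrary rank-one matrix, it must be true that

$$K_1(\varepsilon, \mathcal{Y}, \mathcal{Z}) = -R_1^{-1}(\varepsilon) B_1^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r, \tag{32}$$

$$\vdots$$

$$K_N(\varepsilon, \mathcal{Y}, \mathcal{Z}) = -R_N^{-1}(\varepsilon) B_N^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r, \tag{33}$$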

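where $\hat{\mu}_r \triangleq \mu_r / \mu_1$ for $\mu_1 > 0$. Substituting the gain
expressions (32) and (33) into the right member of the HJB
equation (31) yields the value of the minimum

$$\begin{aligned}
& x_0^T \Big[ \sum_{l=1}^{k} \mu_l \frac{d}{d\varepsilon} \mathcal{E}_l(\varepsilon)
 - A^T(\varepsilon) \sum_{l=1}^{k} \mu_l \mathcal{Y}_l - \sum_{l=1}^{k} \mu_l \mathcal{Y}_l A(\varepsilon) - \mu_1 Q(\varepsilon) \\
& \quad + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{l=1}^{k} \mu_l \mathcal{Y}_l
 + \sum_{l=1}^{k} \mu_l \mathcal{Y}_l \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s \\
& \quad - \mu_1 \sum_{r=1}^{k} \hat{\mu}_r \mathcal{Y}_r \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{Y}_s
 - \sum_{l=2}^{k} \mu_l \sum_{q=1}^{l-1} \frac{2\, l!}{q!\,(l-q)!} \mathcal{Y}_q G(\varepsilon) W G^T(\varepsilon) \mathcal{Y}_{l-q} \Big] x_0 \\
& \quad + \sum_{l=1}^{k} \mu_l \frac{d}{d\varepsilon} \mathcal{T}_l(\varepsilon) - \sum_{l=1}^{k} \mu_l \mathrm{Tr}\left\{\mathcal{Y}_l G(\varepsilon) W G^T(\varepsilon)\right\}.
\end{aligned} \tag{34}$$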

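It is now necessary to exhibit $\{\mathcal{E}_p(\cdot)\}_{p=1}^{k}$ and $\{\mathcal{T}_p(\cdot)\}_{p=1}^{k}$
which render the expression (34) equal to zero for
$\varepsilon \in [t_0,t_f]$, when $\{\mathcal{Y}_p\}_{p=1}^{k}$ are evaluated along solution
trajectories. Studying the expression (34) reveals that $\mathcal{E}_p(\cdot)$ and
$\mathcal{T}_p(\cdot)$ for $1 \le p \le k$ satisfying the differential equations

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{E}_1(\varepsilon) = {} & A^T(\varepsilon) \mathcal{H}_1(\varepsilon) + \mathcal{H}_1(\varepsilon) A(\varepsilon) + Q(\varepsilon)
 - \mathcal{H}_1(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) \\
& - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \mathcal{H}_1(\varepsilon)
 + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon)
\end{aligned} \tag{35}$$

and, for $2 \le p \le k$

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{E}_p(\varepsilon) = {} & A^T(\varepsilon) \mathcal{H}_p(\varepsilon) + \mathcal{H}_p(\varepsilon) A(\varepsilon)
 - \mathcal{H}_p(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) \\
& - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \mathcal{H}_p(\varepsilon)
 + \sum_{q=1}^{p-1} \frac{2\, p!}{q!\,(p-q)!} \mathcal{H}_q(\varepsilon) G(\varepsilon) W G^T(\varepsilon) \mathcal{H}_{p-q}(\varepsilon),
\end{aligned} \tag{36}$$

together with, for $1 \le p \le k$

$$\frac{d}{d\varepsilon} \mathcal{T}_p(\varepsilon) = \mathrm{Tr}\left\{\mathcal{H}_p(\varepsilon) G(\varepsilon) W G^T(\varepsilon)\right\} \tag{37}$$

will work. Furthermore, at the boundary condition, it is
necessary to have $\mathcal{W}(t_0, \mathcal{H}_0, \mathcal{D}_0) = \phi_0(t_0, \mathcal{H}_0, \mathcal{D}_0)$ or,
equivalently,

$$x_0^T \sum_{l=1}^{k} \mu_l \big(\mathcal{H}_{l0} + \mathcal{E}_l(t_0)\big) x_0 + \sum_{l=1}^{k} \mu_l \big(\mathcal{D}_{l0} + \mathcal{T}_l(t_0)\big)
 = x_0^T \sum_{l=1}^{k} \mu_l \mathcal{H}_{l0}\, x_0 + \sum_{l=1}^{k} \mu_l \mathcal{D}_{l0}.$$

Thus, matching the boundary condition yields the corresponding
initial value conditions $\mathcal{E}_p(t_0) = 0$ and $\mathcal{T}_p(t_0) = 0$
for the equations (35)-(37). Applying the feedback gains
specified in (32) and (33) along the solution trajectories of
the equations (22)-(23), these equations become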


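$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{H}_1(\varepsilon) = {} & -A^T(\varepsilon) \mathcal{H}_1(\varepsilon) - \mathcal{H}_1(\varepsilon) A(\varepsilon) - Q(\varepsilon)
 + \mathcal{H}_1(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) \\
& + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \mathcal{H}_1(\varepsilon)
 - \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon)
\end{aligned} \tag{38}$$

and, for $2 \le p \le k$

$$\begin{aligned}
\frac{d}{d\varepsilon} \mathcal{H}_p(\varepsilon) = {} & -A^T(\varepsilon) \mathcal{H}_p(\varepsilon) - \mathcal{H}_p(\varepsilon) A(\varepsilon)
 + \mathcal{H}_p(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \sum_{s=1}^{k} \hat{\mu}_s \mathcal{H}_s(\varepsilon) \\
& + \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r(\varepsilon) \Big[ \sum_{i=1}^{N} B_i(\varepsilon) R_i^{-1}(\varepsilon) B_i^T(\varepsilon) \Big] \mathcal{H}_p(\varepsilon)
 - \sum_{q=1}^{p-1} \frac{2\, p!}{q!\,(p-q)!} \mathcal{H}_q(\varepsilon) G(\varepsilon) W G^T(\varepsilon) \mathcal{H}_{p-q}(\varepsilon),
\end{aligned} \tag{39}$$

together with, for $1 \le p \le k$

$$\frac{d}{d\varepsilon} \mathcal{D}_p(\varepsilon) = -\mathrm{Tr}\left\{\mathcal{H}_p(\varepsilon) G(\varepsilon) W G^T(\varepsilon)\right\} \tag{40}$$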

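where the terminal conditions are $\mathcal{H}_1(t_f) = Q_f$, $\mathcal{H}_p(t_f) = 0$
for $2 \le p \le k$ and $\mathcal{D}_p(t_f) = 0$ for $1 \le p \le k$. Thus, whenever
these equations (38)-(40) admit solutions $\{\mathcal{H}_p(\cdot)\}_{p=1}^{k}$
and $\{\mathcal{D}_p(\cdot)\}_{p=1}^{k}$, the existence of $\{\mathcal{E}_p(\cdot)\}_{p=1}^{k}$ and
$\{\mathcal{T}_p(\cdot)\}_{p=1}^{k}$ satisfying the equations (35)-(37) is assured.
By comparing equations (35)-(37) to those of (38)-(40), one
may recognize that these sets of equations are related to one
another by $\frac{d}{d\varepsilon} \mathcal{E}_p(\varepsilon) = -\frac{d}{d\varepsilon} \mathcal{H}_p(\varepsilon)$ and
$\frac{d}{d\varepsilon} \mathcal{T}_p(\varepsilon) = -\frac{d}{d\varepsilon} \mathcal{D}_p(\varepsilon)$
for $1 \le p \le k$. Enforcing the initial value conditions
$\mathcal{E}_p(t_0) = 0$ and $\mathcal{T}_p(t_0) = 0$ uniquely implies that
$\mathcal{E}_p(\varepsilon) = \mathcal{H}_p(t_0) - \mathcal{H}_p(\varepsilon)$ and $\mathcal{T}_p(\varepsilon) = \mathcal{D}_p(t_0) - \mathcal{D}_p(\varepsilon)$ for
all $\varepsilon \in [t_0, t_f]$ and yields a value function

$$\mathcal{W}(\varepsilon, \mathcal{Y}, \mathcal{Z}) = \mathcal{V}(\varepsilon, \mathcal{Y}, \mathcal{Z})
 = x_0^T \sum_{l=1}^{k} \mu_l \mathcal{H}_l(t_0)\, x_0 + \sum_{l=1}^{k} \mu_l \mathcal{D}_l(t_0),$$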

1=1  1=1 

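for which the sufficient condition (27) of the verification
theorem is satisfied. Therefore, the optimal decision laws for
decision maker 1, (32), through decision maker $N$, (33),
minimizing the performance index stated in (24) become

$$K_1^*(\varepsilon) = -R_1^{-1}(\varepsilon) B_1^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\varepsilon), \tag{41}$$

$$\vdots$$

$$K_N^*(\varepsilon) = -R_N^{-1}(\varepsilon) B_N^T(\varepsilon) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\varepsilon). \tag{42}$$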

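Theorem 11: Multi-Cumulant Cooperative Strategies.
Consider the multi-person linear-quadratic differential system
(4)-(5) whose pairs $(A, B_1), \ldots, (A, B_N)$ are uniformly
stabilizable on $[t_0,t_f]$. Let $k \in \mathbb{Z}^+$ and the sequence
$\mu = \{\mu_l \ge 0\}_{l=1}^{k}$ with $\mu_1 > 0$. Then the optimal decision laws
are achieved by the cooperative feedback control gains

$$K_1^*(\alpha) = -R_1^{-1}(\alpha) B_1^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\alpha), \tag{43}$$

$$\vdots$$

$$K_N^*(\alpha) = -R_N^{-1}(\alpha) B_N^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_r^*(\alpha), \tag{44}$$

where the normalized weights $\hat{\mu}_r \triangleq \mu_r / \mu_1$, mutually chosen by
the cooperative decision makers, represent the different levels of
influence that they deem important to the overall cost distribution,
and $\{\mathcal{H}_r^*(\alpha) \ge 0\}_{r=1}^{k}$ are the optimal solutions of the
backward-in-time coupled Riccati-type differential equations

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_1^*(\alpha) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i^*(\alpha) \Big]^T \mathcal{H}_1^*(\alpha)
 - \mathcal{H}_1^*(\alpha) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i^*(\alpha) \Big] \\
& - Q(\alpha) - \sum_{i=1}^{N} K_i^{*T}(\alpha) R_i(\alpha) K_i^*(\alpha),
\end{aligned} \tag{45}$$

and, for $2 \le r \le k$

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_r^*(\alpha) = {} & -\Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i^*(\alpha) \Big]^T \mathcal{H}_r^*(\alpha)
 - \mathcal{H}_r^*(\alpha) \Big[ A(\alpha) + \sum_{i=1}^{N} B_i(\alpha) K_i^*(\alpha) \Big] \\
& - \sum_{s=1}^{r-1} \frac{2\, r!}{s!\,(r-s)!} \mathcal{H}_s^*(\alpha) G(\alpha) W G^T(\alpha) \mathcal{H}_{r-s}^*(\alpha),
\end{aligned} \tag{46}$$

with the terminal boundary conditions $\mathcal{H}_1^*(t_f) = Q_f$ and
$\mathcal{H}_r^*(t_f) = 0$ when $2 \le r \le k$.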
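The gains in Theorem 11 depend on the Riccati-type solutions, which in turn depend on the gains, so the two must be computed together. The sketch below does this for a scalar state with time-invariant data by substituting the gain formula into the right-hand sides at every Runge-Kutta stage. The function name `cooperative_gains`, the scalar restriction, and the fixed-step integrator are assumptions of this illustration, not part of the paper.

```python
from math import factorial, tanh

def cooperative_gains(a, b, r_w, q, g2w, qf, mu, t0, tf, steps=4000):
    """Scalar sketch of (43)-(46): integrate H_1..H_k backward with RK4.

    a, q, g2w, qf : scalar A, Q, G W G^T and Q_f (assumed constant)
    b, r_w        : lists of B_i and R_i, one entry per decision maker
    mu            : weights mu_1..mu_k with mu_1 > 0
    Returns (H, K): H_r(t0) for r = 1..k and the gains K_i(t0).
    """
    k = len(mu)
    mu_hat = [m / mu[0] for m in mu]             # normalized weights

    def gains(H):
        mix = sum(mh * Hr for mh, Hr in zip(mu_hat, H))
        return [-bi / ri * mix for bi, ri in zip(b, r_w)]

    def rhs(H):
        K = gains(H)
        a_cl = a + sum(bi * Ki for bi, Ki in zip(b, K))
        q_cl = q + sum(ri * Ki * Ki for ri, Ki in zip(r_w, K))
        dH = []
        for r in range(1, k + 1):
            val = -2.0 * a_cl * H[r - 1]
            if r == 1:
                val -= q_cl
            for s in range(1, r):
                c = 2.0 * factorial(r) / (factorial(s) * factorial(r - s))
                val -= c * H[s - 1] * g2w * H[r - s - 1]
            dH.append(val)
        return dH

    H = [qf] + [0.0] * (k - 1)                   # terminal values at tf
    h = (t0 - tf) / steps                        # negative step: backward in time
    for _ in range(steps):
        s1 = rhs(H)
        s2 = rhs([x + 0.5 * h * d for x, d in zip(H, s1)])
        s3 = rhs([x + 0.5 * h * d for x, d in zip(H, s2)])
        s4 = rhs([x + h * d for x, d in zip(H, s3)])
        H = [x + h / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
             for x, d1, d2, d3, d4 in zip(H, s1, s2, s3, s4)]
    return H, gains(H)

# One decision maker and k = 1: the classical linear-quadratic case
H_lq, K_lq = cooperative_gains(a=0.0, b=[1.0], r_w=[1.0], q=1.0, g2w=1.0,
                               qf=0.0, mu=[1.0], t0=0.0, tf=1.0)
reference = tanh(1.0)   # closed-form Riccati solution for this special case

# Two decision makers sharing one state, with a small weight on the variance
H_two, K_two = cooperative_gains(a=0.0, b=[1.0, 0.5], r_w=[1.0, 2.0], q=1.0,
                                 g2w=1.0, qf=0.0, mu=[1.0, 0.1], t0=0.0, tf=1.0)
```

With $k = 1$ and a single decision maker the scalar equation reduces to $\frac{d}{d\alpha}\mathcal{H}_1 = \mathcal{H}_1^2 - 1$ with $\mathcal{H}_1(t_f) = 0$, whose solution is $\tanh(t_f - \alpha)$; this gives a simple check of the integrator before weights on higher cumulants are introduced.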

Consider now a situation where the cooperative decision makers not
only minimize the overall performance index of the total system but
also, at the same time, ensure that the closed-loop poles lie to
the left of the line $\mathrm{Re}(j\omega) = -\sigma$ for a prescribed $\sigma \in \mathbb{R}^+$. The
advantages offered by this additional consideration include
system robustness against variations of system parameters
as well as tolerance of time delays and nonlinearities in the
closed loop.

In place of the original cost (1), the new cost with a
prescribed degree of stability $\sigma > 0$ is given by

$$J(t_0, x_0; u_1, \ldots, u_N) = x^T(t_f) Q_f e^{2\sigma t_f} x(t_f)
 + \int_{t_0}^{t_f} \Big[ x^T(\tau) Q(\tau) x(\tau) + \sum_{i=1}^{N} u_i^T(\tau) R_i(\tau) u_i(\tau) \Big] e^{2\sigma\tau}\, d\tau. \tag{47}$$

Intuitively, the new control optimization can be converted
to the original optimization problem with some changes of
variables: $x_\sigma(t) = x(t) e^{\sigma t}$, $u_\sigma(t) = u(t) e^{\sigma t}$, and
$w_\sigma(t) = w(t) e^{\sigma t}$. The strategy solutions are summarized as follows.

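Theorem 12: Strategies with a Prescribed Stability.
Consider the multi-person linear-quadratic differential system
(4)-(5) whose pairs $(A, B_i)$ are uniformly stabilizable on
$[t_0, t_f]$. Let $k \in \mathbb{Z}^+$ and the sequence $\mu = \{\mu_l \ge 0\}_{l=1}^{k}$ with
$\mu_1 > 0$. Then the optimal decision laws with a prescribed
degree of stability $\sigma > 0$ are achieved by the cooperative gains

$$K_{\sigma,1}^*(\alpha) = -R_1^{-1}(\alpha) B_1^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_{\sigma,r}^*(\alpha), \tag{48}$$

$$\vdots$$

$$K_{\sigma,N}^*(\alpha) = -R_N^{-1}(\alpha) B_N^T(\alpha) \sum_{r=1}^{k} \hat{\mu}_r \mathcal{H}_{\sigma,r}^*(\alpha), \tag{49}$$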

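where the normalized weights $\hat{\mu}_r \triangleq \mu_r / \mu_1$ represent the
different levels of influence that the decision makers deem
important to the overall cost distribution and
$\{\mathcal{H}_{\sigma,r}^*(\alpha) \ge 0\}_{r=1}^{k}$ are the optimal solutions of the equations

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_{\sigma,1}^*(\alpha) = {} & -\Big[ A(\alpha) + \sigma I + \sum_{i=1}^{N} B_i(\alpha) K_{\sigma,i}^*(\alpha) \Big]^T \mathcal{H}_{\sigma,1}^*(\alpha)
 - \mathcal{H}_{\sigma,1}^*(\alpha) \Big[ A(\alpha) + \sigma I + \sum_{i=1}^{N} B_i(\alpha) K_{\sigma,i}^*(\alpha) \Big] \\
& - Q(\alpha) - \sum_{i=1}^{N} K_{\sigma,i}^{*T}(\alpha) R_i(\alpha) K_{\sigma,i}^*(\alpha),
\end{aligned} \tag{50}$$

and, for $2 \le r \le k$

$$\begin{aligned}
\frac{d}{d\alpha} \mathcal{H}_{\sigma,r}^*(\alpha) = {} & -\Big[ A(\alpha) + \sigma I + \sum_{i=1}^{N} B_i(\alpha) K_{\sigma,i}^*(\alpha) \Big]^T \mathcal{H}_{\sigma,r}^*(\alpha)
 - \mathcal{H}_{\sigma,r}^*(\alpha) \Big[ A(\alpha) + \sigma I + \sum_{i=1}^{N} B_i(\alpha) K_{\sigma,i}^*(\alpha) \Big] \\
& - \sum_{s=1}^{r-1} \frac{2\, r!}{s!\,(r-s)!} \mathcal{H}_{\sigma,s}^*(\alpha) G(\alpha) W G^T(\alpha) \mathcal{H}_{\sigma,r-s}^*(\alpha),
\end{aligned} \tag{51}$$

with the terminal boundary conditions $\mathcal{H}_{\sigma,1}^*(t_f) = Q_f$ and
$\mathcal{H}_{\sigma,r}^*(t_f) = 0$ when $2 \le r \le k$.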
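The effect of the change of variables introduced before Theorem 12 can be checked on the state equation and on the cost integrand. The following is a sketch only: it applies the product rule to $x_\sigma(t) = x(t) e^{\sigma t}$ (the factor $e^{\sigma t}$ is deterministic, so no extra second-order term arises), writes $u_{\sigma,i}(t) = u_i(t) e^{\sigma t}$ with a subscript $i$ introduced here for illustration, and leaves the noise term in its original variable rather than reproducing the paper's treatment of $w_\sigma$.

$$\begin{aligned}
dx_\sigma(t) &= \sigma e^{\sigma t} x(t)\, dt + e^{\sigma t}\, dx(t) \\
&= \Big[ \big(A(t) + \sigma I\big) x_\sigma(t) + \sum_{i=1}^{N} B_i(t) u_{\sigma,i}(t) \Big] dt + e^{\sigma t} G(t)\, dw(t), \\
x^T(\tau) Q(\tau) x(\tau)\, e^{2\sigma\tau} &= x_\sigma^T(\tau) Q(\tau) x_\sigma(\tau), \qquad
u_i^T(\tau) R_i(\tau) u_i(\tau)\, e^{2\sigma\tau} = u_{\sigma,i}^T(\tau) R_i(\tau) u_{\sigma,i}(\tau).
\end{aligned}$$

In the new variables the weighted cost (47) therefore has the same quadratic form as the original cost (1), while the drift matrix is shifted from $A$ to $A + \sigma I$, which is the shift visible in (50).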

V.  Conclusions 

The recent results offer nontrivial educational and pedagogical
contributions as well as performance analysis tools
toward the establishment of new, statistically based design
procedures for cooperative decision problems and stochastic
games. Their practicality may be found in network-enabled
collaborative systems, multi-layered sensing, and single integrated
situational awareness applications.